Python Library Presentation

NetworkX


Presented by:
Mary Kryslette Bunyi
10 November 2021

Network Analysis

Networks appear in a variety of fields such as social media, global value chains, disease outbreaks, and internet browsing. Network analysis allows us to:

  • determine important actors/pieces within an organization/system
  • identify clusters and analyze interconnectivity within a network
  • predict a network's future direction

Figure: Political blogs prior to the 2004 US Presidential election

Graph Network

  • Nodes: entities of interest
    • people, organizations, concepts, database tables/relations
  • Edges: relationships between the nodes
    • may be directional, may represent different types of links within the same network, and may carry weight
    • financial transactions, workplace hierarchy, or shared keys between database tables

Figure: Simple network graph with its node and edge lists
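
As a minimal sketch of the node and edge lists described above, the snippet below builds a small directed graph whose edges carry a type and a weight. The names ("Ana", "Ben", "Acme") and the attribute keys (kind, relation, weight) are hypothetical placeholders, not data from this presentation.

```python
import networkx as nx

# Hypothetical toy data: two people and one organization
G = nx.DiGraph()
G.add_node("Ana", kind="person")
G.add_node("Ben", kind="person")
G.add_node("Acme", kind="organization")

# Directed edges with a link type and a weight stored as attributes
G.add_edge("Ana", "Ben", relation="manages", weight=1.0)
G.add_edge("Ana", "Acme", relation="pays", weight=250.0)

# Node list and edge list, each with its attribute dictionary
node_list = list(G.nodes(data=True))
edge_list = list(G.edges(data=True))
print(node_list)
print(edge_list)
```

Storing the link type as an edge attribute keeps one edge per node pair; if the same pair needed several parallel links, nx.MultiDiGraph would be the closer fit.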

NetworkX

  • a Python package for creating, manipulating, visualizing, and studying the structure, dynamics, and functions of complex networks
  • documented as suitable for large real-world graphs, e.g., in excess of 10 million nodes and 100 million edges

Functionalities:

  • generate random and classic networks
  • analyze network structure
  • build network models
  • design new network algorithms
  • draw networks
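
To illustrate the first two functionalities, here is a short sketch that generates two classic networks and one random network, then inspects their structure. The sizes, the edge probability, and the seed are arbitrary choices for illustration.

```python
import networkx as nx

# Classic networks
complete = nx.complete_graph(5)   # every pair of the 5 nodes is linked
path = nx.path_graph(4)           # 0 - 1 - 2 - 3

# Random network: each possible edge is included with probability p
random_g = nx.erdos_renyi_graph(n=20, p=0.2, seed=42)

# Basic structural summaries
print(complete.number_of_edges())   # 10
print(path.number_of_edges())       # 3
print(nx.density(complete))         # 1.0
print(random_g.number_of_nodes())   # 20
```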

Note: NetworkX is primarily for graph analysis, not graph visualization.

  • basic drawing functionalities using Matplotlib
  • appropriate for simpler networks or for exploratory data analysis
  • for more advanced graph visualization, use dedicated, fully featured tools like Graphviz


Installation

NetworkX requires Python 3.7 or newer.

To install the latest release of the package, run pip install networkx[default].

To install the package without the optional dependencies (e.g., numpy, scipy), run pip install networkx.

Alternatively, manual downloads are also possible through NetworkX's GitHub or PyPI repositories.
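
One way to confirm the installation is to import the package and check which optional dependencies are present. This is only a sketch: the list of package names checked below is an assumption about what the default extra pulls in and may differ between releases.

```python
import importlib.util
import networkx as nx

print("networkx", nx.__version__)

# Assumed list of optional dependencies; adjust to your NetworkX release
optional = {}
for name in ("numpy", "scipy", "matplotlib", "pandas"):
    optional[name] = importlib.util.find_spec(name) is not None
    print(name, "available" if optional[name] else "missing")
```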

Simple Plotting

Creating and drawing a graph consists of five basic steps:

  1. NetworkX package import:     import networkx as nx
  2. Create the graph object:     g = nx.Graph()
  3. Add nodes*:      g.add_node(node)
  4. Add edges:      g.add_edge(node_1, node_2)
  5. Draw the graph:     nx.draw(g)

Note: Step 3 may be skipped as nodes will automatically be created when edges are created between non-existent nodes.
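
The note above can be checked with a small sketch (the node names are placeholders): adding an edge between nodes that do not exist yet creates them, while a node added on its own stays isolated.

```python
import networkx as nx

g = nx.Graph()
g.add_node("a")        # explicit node, no edges yet
g.add_edge("b", "c")   # "b" and "c" are created implicitly

print(sorted(g.nodes()))   # ['a', 'b', 'c']
print(list(g.edges()))     # [('b', 'c')]
print(g.degree("a"))       # 0
```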

In [14]:
# Import the package
import networkx as nx

# Create a networkx graph object
my_graph = nx.Graph() 
 
# Add edges to the graph object
# Each tuple represents an edge between two nodes
my_graph.add_edges_from([
                        (1,2), 
                        (1,3), 
                        (3,4), 
                        (1,5), 
                        (3,5),
                        (4,2),
                        (2,3),
                        (3,0)])
 
# Draw the resulting graph
nx.draw(my_graph, with_labels=True, font_weight='bold')

Importing Network Data from a DataFrame (using from_pandas_edgelist)

In [15]:
import pandas as pd

# create a dataframe
df = pd.DataFrame({'from': ['A', 'B', 'C', 'A'], 
                   'to': ['D', 'A', 'E', 'C']})
# create graph object
G = nx.from_pandas_edgelist(df, 'from', 'to')
# plot the network graph
nx.draw(G, with_labels=False, node_size=500, alpha=1, linewidths=20)

Tweaking node coordinates (using pos) and graph elements (matplotlib integration)

In [16]:
import matplotlib.pyplot as plt

# create nodes and edges
G = nx.Graph()
G.add_edge(1, 2)
G.add_edge(1, 3)
G.add_edge(1, 5)
G.add_edge(2, 3)
G.add_edge(3, 4)
G.add_edge(4, 5)

# set positions
pos = {1: (0, 0), 2: (-1, 0.3), 3: (2, 0.17), 4: (4, 0.255), 5: (5, 0.03)}

options = {
    "font_size": 36,
    "node_size": 3000,
    "node_color": "white",
    "edgecolors": "black",
    "linewidths": 5,
    "width": 5}

nx.draw_networkx(G, pos, **options)

# Set margins for the axes so that nodes aren't clipped
ax = plt.gca()
ax.margins(0.20)
plt.axis("off")
plt.show()

Directed graph (DiGraph)

Each edge is given as a (source, target) pair, so the source node must be specified before the target node.
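
A minimal sketch of why the order matters, using placeholder node names: in a DiGraph the edge exists only from source to target, whereas an undirected Graph treats both orders as the same edge.

```python
import networkx as nx

D = nx.DiGraph()
D.add_edge("source", "target")

print(D.has_edge("source", "target"))   # True
print(D.has_edge("target", "source"))   # False
print(list(D.successors("source")))     # ['target']
print(list(D.predecessors("target")))   # ['source']

# The same edge in an undirected graph has no direction
U = nx.Graph()
U.add_edge("source", "target")
print(U.has_edge("target", "source"))   # True
```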

In [17]:
# create graph object
G = nx.DiGraph([(0, 3), (1, 3), (2, 4), (3, 5), (3, 6), (4, 6), (5, 6)])

# group nodes by column
left_nodes = [0, 1, 2]
middle_nodes = [3, 4]
right_nodes = [5, 6]

# set the position according to column (x-coord)
pos = {n: (0, i) for i, n in enumerate(left_nodes)}
pos.update({n: (1, i + 0.5) for i, n in enumerate(middle_nodes)})
pos.update({n: (2, i + 0.5) for i, n in enumerate(right_nodes)})

nx.draw_networkx(G, pos, **options)

# Set margins for the axes so that nodes aren't clipped
ax = plt.gca()
ax.margins(0.20)
plt.axis("off")
plt.show()

Crude Visualization Exercise

In [18]:
# import data
msn = pd.read_csv("https://github.com/mkbunyi/Data-Viz-Tutorial-NetworkX/raw/main/marvel_social_network.csv")
msn.head()
Out[18]:
Character Name Line ID Path Ch Name Diff Character ID Character Name (copy) Main Character Name in Caps Relation Relation - Negative ... Relation Sentiment Relationship Set ID Synergy Type URL 1 Number of Records S.No. X Y
0 Abomination 1 1 NaN 1 Abomination Abomination ABOMINATION Friends NaN ... Positive NaN 1 Reciprocal NaN 1 1 1 6581.620117 2663.285156
1 Rhino 1 2 Rhino 83 Rhino Abomination RHINO Friends NaN ... Positive Relationship with Abomination - 1 Reciprocal NaN 1 1 2 7942.243652 9485.032227
2 Abomination 2 1 NaN 1 Abomination Abomination ABOMINATION Nemesis NaN ... Negative NaN 1 Perfect NaN 1 1 3 6581.620117 2663.285156
3 Hulk 2 2 Hulk 45 Hulk Abomination HULK Nemesis Nemesis ... Negative Relationship with Abomination - 1 Perfect NaN 1 1 4 4651.433594 5910.624023
4 Abomination 3 1 NaN 1 Abomination Abomination ABOMINATION Enemies NaN ... Negative NaN 1 Normal NaN 1 1 5 6581.620117 2663.285156

5 rows × 25 columns

Reformat the data from "long" to "wide" format
In [19]:
# reformat to combine linked characters in 1 row

# collapse rows by line ID and combine linked characters in a list per cell
msn_nx = msn.groupby('Line ID').agg(lambda x: x.tolist())
msn_nx = msn_nx[["Character Name","Character ID"]]

# split list and allocate separate columns for the linked characters
msn_nx = pd.concat([msn_nx["Character Name"].apply(pd.Series),
          msn_nx["Character ID"].apply(pd.Series)],
          axis=1).reset_index()        

# add relation type
msn_nx = msn_nx.merge(msn[["Line ID","Relation","Relation Sentiment"]],
            on="Line ID", how = "left").drop_duplicates()

# rename columns
msn_nx.columns = ['Line ID', 'Char1_Name', 'Char2_Name', 'Char1_ID', 'Char2_ID', 'Relation', 'Relation Sentiment']

# reset index
msn_nx = msn_nx.reset_index()

# set color according to relation sentiment
# (green = Positive, red = Negative, black = anything else, e.g. Neutral)
sentiment_colors = {"Positive": "green", "Negative": "red"}
msn_nx['color'] = msn_nx['Relation Sentiment'].map(sentiment_colors).fillna("black")
In [11]:
# view reformatted data
msn_nx.head()
Out[11]:
   index  Line ID  Char1_Name   Char2_Name  Char1_ID  Char2_ID  Relation  Relation Sentiment  color
0  0      1        Abomination  Rhino       1         83        Friends   Positive            green
1  2      2        Abomination  Hulk        1         45        Nemesis   Negative            red
2  4      3        Abomination  She-Hulk    1         90        Enemies   Negative            red
3  6      4        Abomination  Red Hulk    1         82        Enemies   Negative            red
4  8      5        Abomination  King Groot  1         59        Friends   Positive            green
Create graph object
In [9]:
# Initialize a graph object
G = nx.from_pandas_edgelist(msn_nx,
                            'Char1_Name', 
                            'Char2_Name',
                            edge_attr=["Relation","Relation Sentiment"])
In [10]:
# Generate layout for visualization
pos = nx.kamada_kawai_layout(G)
In [11]:
# Manual position tweaking
pos["Captain America"] += (0, -1)

Tweaking node size

We will set node size proportional to the number of links.

In [12]:
# node size is proportional to number of links (the node's degree)
links = dict(G.degree())
In [15]:
fig, ax = plt.subplots(figsize=(40, 40))

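# draw edges
# look up each edge's color by its endpoints, because the order of G.edges()
# is not guaranteed to match the row order of msn_nx
color_by_pair = {frozenset((a, b)): c for a, b, c in
                 zip(msn_nx["Char1_Name"], msn_nx["Char2_Name"], msn_nx["color"])}
nx.draw_networkx_edges(G, pos, alpha=1, width=5,
                       edge_color=[color_by_pair[frozenset((u, v))] for u, v in G.edges()])

# draw nodes (node names are drawn by draw_networkx_labels)
nx.draw_networkx_nodes(G, pos,
                       node_size = [links[i]*500 for i in G],
                       node_color="blue", alpha=1)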

# draw labels
label_options = {"ec": "black", "fc": "white", "alpha": .9}
nx.draw_networkx_labels(G, pos, font_size=30, bbox=label_options)

# display title
font = {"color": "black", "fontweight": "bold", "fontsize": 40}
ax.set_title("Marvel Social Network", font)

# Resize figure for label readability
ax.margins(0.1, 0.05)
fig.tight_layout()
plt.axis("off")
plt.show()

Subsetting and NetworkX's graph analysis functions

In [18]:
# initialize
top_links = {}

# iterate through nodes to count connections
for char in G.nodes:
    top_links[char] = len(G[char])
    
# convert to dataframe
s = pd.Series(top_links, name='connections')
df = s.to_frame().sort_values('connections', ascending=False)
df
Out[18]:
                      connections
Black Widow           20
Hulk                  20
Wolverine             19
Spider-Man (Classic)  18
Iron Man              18
...                   ...
King Groot            4
Blade                 4
Killmonger            4
Punisher 2099         3
Psylocke              3

118 rows × 1 columns

In [19]:
# keep only the rows where Black Widow is the first character
msn_blackwidow = msn_nx[msn_nx["Char1_Name"]=="Black Widow"].reset_index()
msn_blackwidow.head(20)
Out[19]:
    level_0  index  Line ID  Char1_Name   Char2_Name                 Char1_ID  Char2_ID  Relation               Relation Sentiment  color
0   80       160    81       Black Widow  Archangel                  10        5         Teammates              Positive            green
1   81       162    82       Black Widow  Black Panther (Civil War)  10        9         Friends                Positive            green
2   82       164    83       Black Widow  Captain Marvel             10        15        Friends                Positive            green
3   83       166    84       Black Widow  Crossbones                 10        19        Rivals                 Negative            red
4   84       168    85       Black Widow  Daredevil (Classic)        10        23        Romance                Positive            green
5   85       170    86       Black Widow  Elektra                    10        32        Rivals                 Negative            red
6   86       172    87       Black Widow  Falcon                     10        33        Enemies                Negative            red
7   87       174    88       Black Widow  Hawkeye                    10        41        Romance                Positive            green
8   88       176    89       Black Widow  Hulk                       10        45        Avengers               Positive            green
9   89       178    90       Black Widow  Hulk (Ragnarok)            10        46        Lullaby                Positive            green
10  90       180    91       Black Widow  Hulkbuster                 10        47        Avengers               Positive            green
11  91       182    92       Black Widow  Iceman                     10        49        Teammates              Positive            green
12  92       184    93       Black Widow  Ms. Marvel                 10        72        Friends                Positive            green
13  93       186    94       Black Widow  Quake                      10        81        S.H.I.E.L.D Clearance  Neutral             black
14  94       188    95       Black Widow  Sentry                     10        89        Friends                Positive            green
15  95       190    96       Black Widow  Thor (Jane Foster)         10        102       Friends                Positive            green
16  96       192    97       Black Widow  Ultron                     10        104       Enemies                Negative            red
17  97       194    98       Black Widow  Void                       10        111       Overcoming Fear        Neutral             black
18  98       196    99       Black Widow  War Machine                10        113       Teammates              Positive            green
19  99       198    100      Black Widow  Winter Soldier             10        114       Romance                Positive            green
Visualization of Black Widow's network

We will use the spring layout, which tends to place a highly connected node such as Black Widow near the middle of the graph.

In [20]:
# initialize plot
fig, ax = plt.subplots(figsize=(10, 8))

# Initialize a graph object
G = nx.from_pandas_edgelist(msn_blackwidow,
                            'Char1_Name', 
                            'Char2_Name',
                            edge_attr=["Relation","Relation Sentiment"])

# Draw using a spring layout
nx.draw_spring(G,with_labels=True,
               edge_color=[msn_blackwidow["color"][i] for i in list(range(len(msn_blackwidow)))],
               node_color = "gainsboro",
              node_size = 2000,
              font_size=11)

# Resize figure for label readability
fig.tight_layout()
plt.axis("off")
plt.margins(x=0.4)
plt.show()
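
As an alternative to filtering the DataFrame, a node's immediate neighborhood can also be extracted from the graph itself with nx.ego_graph. The sketch below uses a hypothetical toy network rather than the Marvel data. Unlike the row filter above, which keeps only edges that touch the chosen character, the ego graph also keeps edges among that character's neighbors.

```python
import networkx as nx

# Hypothetical toy network; node names are placeholders
G_toy = nx.Graph()
G_toy.add_edges_from([
    ("hub", "a"), ("hub", "b"), ("hub", "c"),
    ("a", "b"),   # link between two of the hub's neighbors
    ("c", "d"),   # "d" is two steps away from the hub
])

# Subgraph of the hub and everything within one step of it
ego = nx.ego_graph(G_toy, "hub", radius=1)
print(sorted(ego.nodes()))     # ['a', 'b', 'c', 'hub']
print(ego.number_of_edges())   # 4
```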

Other use cases

Connected components

  • Cluster groups according to feature similarities (e.g., same mobile number or address)
  • Webs of accounts or channels used by criminal groups
  • Transportation clusters
In [21]:
# load in data
cities = pd.read_csv("https://github.com/mkbunyi/Data-Viz-Tutorial-NetworkX/raw/main/distances.csv")
cities.head()
Out[21]:
node1 node2 distance
0 Mannheim Frankfurt 85
1 Mannheim Karlsruhe 80
2 Erfurt Wurzburg 186
3 Munchen Numberg 167
4 Munchen Augsburg 84
In [22]:
# create graph object along with nodes and edges
g = nx.Graph()
for edge in range(len(cities)):
    g.add_edge(cities["node1"][edge],
               cities["node2"][edge], 
               weight = cities["distance"][edge])

connected_components function to identify distinct sub-groups

In [23]:
for i, x in enumerate(nx.connected_components(g)):
    print("cc"+str(i)+":",x)
cc0: {'Erfurt', 'Wurzburg', 'Mannheim', 'Frankfurt', 'Karlsruhe', 'Stuttgart', 'Augsburg', 'Numberg', 'Munchen', 'Kassel'}
cc1: {'Kolkata', 'Bangalore', 'Delhi', 'Mumbai'}
cc2: {'ALB', 'NY', 'TX'}
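
The same idea on a toy graph (placeholder node names), as a sketch of a few related calls: counting the components, extracting the largest one as a subgraph, and checking whether two nodes are connected at all.

```python
import networkx as nx

g_toy = nx.Graph()
g_toy.add_edges_from([("A", "B"), ("B", "C"), ("X", "Y")])
g_toy.add_node("Z")   # an isolated node forms its own component

# Components sorted from largest to smallest
components = sorted(nx.connected_components(g_toy), key=len, reverse=True)
print(nx.number_connected_components(g_toy))   # 3

# Subgraph containing only the largest component
largest = g_toy.subgraph(components[0])
print(sorted(largest.nodes()))                 # ['A', 'B', 'C']

# Nodes in different components have no path between them
print(nx.has_path(g_toy, "A", "X"))            # False
```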

Visualization

Spring layout: the parameter k sets the optimal distance between nodes. The higher the value of k, the farther apart the nodes are placed.
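
A small sketch of how k and seed are passed, using an arbitrary six-node cycle graph. When k is left unset, NetworkX documents the default as 1/sqrt(n) for n nodes. Fixing seed makes the layout reproducible, which the snippet checks by computing the same layout twice.

```python
import networkx as nx
import numpy as np

g_demo = nx.cycle_graph(6)

# Same k and same seed: the two layouts should coincide
pos_a = nx.spring_layout(g_demo, k=0.5, seed=10)
pos_b = nx.spring_layout(g_demo, k=0.5, seed=10)
same = all(np.allclose(pos_a[n], pos_b[n]) for n in g_demo)
print(same)   # True

# A larger k asks for more distance between nodes
pos_wide = nx.spring_layout(g_demo, k=5, seed=10)
print(pos_wide[0])   # (x, y) coordinates of node 0
```

Because the resulting positions are rescaled to fit the plotting area by default, the effect of k is best judged visually by comparing drawings rather than raw coordinates.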

In [24]:
# set layout
pos = nx.spring_layout(g, k=5, seed=10)

# plot the network
nx.draw(g,pos,
        with_labels = True,  #labels nodes
        node_color='lightsteelblue',
        edge_color='slategrey')

# label edges (distance between cities)
edge_labels = nx.get_edge_attributes(g,'weight')
nx.draw_networkx_edge_labels(g,pos,edge_labels=edge_labels,
                            font_size=9,rotate=False)

plt.show()

Shortest path

Examples: Google Maps, grocery shopping, LinkedIn connections

In [25]:
print(nx.shortest_path_length(g, 'Frankfurt','Stuttgart',weight='weight'))
print(nx.shortest_path(g, 'Frankfurt','Stuttgart',weight='weight'))
503
['Frankfurt', 'Wurzburg', 'Numberg', 'Stuttgart']
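
The weight argument changes what "shortest" means. In the toy graph below (placeholder nodes and weights), the direct edge has the fewest hops, but a longer route has the lowest total weight.

```python
import networkx as nx

g_w = nx.Graph()
g_w.add_edge("A", "B", weight=10)   # direct but expensive
g_w.add_edge("A", "C", weight=2)
g_w.add_edge("C", "D", weight=2)
g_w.add_edge("D", "B", weight=2)

# Without weight: fewest edges
hops = nx.shortest_path(g_w, "A", "B")
print(hops)       # ['A', 'B']

# With weight: smallest sum of edge weights
cheapest = nx.shortest_path(g_w, "A", "B", weight="weight")
cost = nx.shortest_path_length(g_w, "A", "B", weight="weight")
print(cheapest)   # ['A', 'C', 'D', 'B']
print(cost)       # 6
```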

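PageRank

This algorithm measures node importance based on the number and quality of the links pointing to it: a link from an important node counts for more than a link from an obscure one. In an undirected graph, each edge is treated as a link in both directions. Its use cases include: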

  • Ranking of websites, tweets, or Facebook users
  • Most influential papers based on citations
  • Central actors in organized criminal networks

Figure: Sample graph showing PageRank within a Facebook user network
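
A minimal sketch of nx.pagerank on a hypothetical four-node directed graph in which three nodes link to "hub". It assumes SciPy is installed, which recent NetworkX releases need for this function; alpha=0.85 is the library's default damping factor.

```python
import networkx as nx

D_pr = nx.DiGraph()
D_pr.add_edges_from([
    ("a", "hub"), ("b", "hub"), ("c", "hub"),   # three incoming links
    ("hub", "a"),                               # hub passes rank on to "a"
])

ranks = nx.pagerank(D_pr, alpha=0.85)
top = max(ranks, key=ranks.get)
print(top)                              # hub
print(round(sum(ranks.values()), 6))    # 1.0
```

Note that "a" ends up ranked well above "b" and "c" even though it has a single incoming link, because that link comes from the highest-ranked node.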

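Centrality measures

NetworkX covers various centrality measures, such as:

  • Degree centrality: the fraction of other nodes that a node is directly connected to
  • Closeness centrality: how short a node's paths to all other nodes are, on average
  • Betweenness centrality: how often a node lies on the shortest paths between other pairs of nodes

Figure: Sample graph showing betweenness centrality within a Facebook user network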
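
As a sketch, the three measures can be compared on a five-node path graph, where the middle node is the most central by every measure and the end nodes lie on no shortest path between other nodes. The graph is an arbitrary illustration, not data from this presentation.

```python
import networkx as nx

g_path = nx.path_graph(5)   # 0 - 1 - 2 - 3 - 4

degree = nx.degree_centrality(g_path)
closeness = nx.closeness_centrality(g_path)
betweenness = nx.betweenness_centrality(g_path)

# Middle node (2)
print(degree[2])                   # 0.5
print(round(closeness[2], 3))      # 0.667
print(round(betweenness[2], 3))    # 0.667

# End node (0) is never "between" two other nodes
print(betweenness[0])              # 0.0
```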